Characterizing the community structure of complex networks is a key challenge in many scientific
fields. Very diverse algorithms and methods have been proposed to this end, many working reason- 
ably well in specific situations. However, no consensus has emerged on which of these methods is 
the best to use in practice. In part, this is due to the fact that testing their performance requires the 
generation of a comprehensive, standard set of synthetic benchmarks, a goal not yet fully achieved. 
Here, we present a type of benchmark that we call "closed", in which an initial network of known 
community structure is progressively converted into a second network whose communities are also 
known. This approach differs from all previously published ones, in which networks evolve toward 
randomness. The use of this type of benchmark allows us to monitor the transformation of the 
community structure of a network. Moreover, we can predict the optimal behavior of the variation 
of information, a measure of the quality of the partitions obtained, at any moment of the process. 
This enables us in many cases to determine the best partition among those suggested by different 
algorithms. Also, since any network can be used as a starting point, extensive studies and com- 
parisons can be performed using a heterogeneous set of structures, including random ones. These 
properties make our benchmarks a general standard for comparing community detection algorithms. 

I. INTRODUCTION

Network analysis offers a powerful approach to solving problems in many scientific fields, including physics, biology, and sociology [1-3]. Community structure is a significant property of these networks. A community can be loosely defined as a set of nodes that are more densely connected among themselves than with the rest of the network. The importance of characterizing community structure derives from the fact that all nodes in a community are expected to share common attributes, features, or functional connections (reviewed in [4]). Many algorithms and methods have been proposed for extracting the optimal partition of a network into communities. While some of them try to improve a global quality function such as Modularity [5] or Surprise [7], others search for the optimal partition by compressing as much as possible the information that describes the network [8], minimizing the Hamiltonian of a Potts-like spin model that represents the graph [9], or deducing the maximum-likelihood model that best fits the structure of the network [10], to name just a few examples. However, none of these algorithms achieves maximal results in all situations. Their performance varies greatly, depending on the topological parameters of the analyzed network.

In order to compare the performance of community detection algorithms, several benchmarks have been proposed. The first ones were based on the planted $\ell$-partition model [12]. The most popular among them is the Girvan and Newman (GN) benchmark [13], in which a network of 128 nodes is divided into four communities of equal size where each node is connected with 16 other members of its own community. This starting graph can then be progressively degraded by replacing links within communities with links between them, keeping the average node degree constant. The relaxed caveman (RC) benchmarks [14, 15] are similar in concept. In them, the starting network is formed by a set of cliques of variable sizes, and a degradation process identical to that already described for the GN benchmark is performed. Notice that GN and RC communities are, by definition, Erdős-Rényi subgraphs [16] in which, throughout the degradation process, each pair of nodes is linked with the same probability p. This makes those benchmarks rather inappropriate for representing real-world networks, since the latter exhibit much more heterogeneous degree distributions [17, 18]. With this idea in mind, Lancichinetti, Fortunato, and Radicchi developed a novel type of benchmark, called LFR [19], in which both the sizes of the communities and the distribution of node degrees are adjusted to follow power laws. In LFR benchmarks, the fraction of links $\mu$ that a node shares with nodes in other communities is tunable. Increasing $\mu$ (often called the "mixing parameter") produces a behavior analogous to that of the degradation process described for GN and RC benchmarks, i.e., the proportion of intercommunity links grows and the original communities gradually disappear. We refer to all of these benchmarks (GN, RC, and LFR) as "open", given that the final outcome is "open-ended" (i.e., the precise final community structure of the network is undetermined).

In this paper, we describe in detail a novel type of 
benchmark (referred to as "closed") that is based on the 
conversion of a network of known community structure 
into a second network whose communities are also known.
We already introduced the concept of a closed benchmark 
in a previous work [7], and we showed how this type of 
benchmark can be successfully used to compare commu- 
nity detection algorithms. Here, we explain it in detail, 
give some examples of its performance, and discuss its 
potential and the significant advantages it presents over 
the aforementioned open benchmarks. We show that the 
guided evolution of the networks to a closed end enables 
us to accurately monitor the transformation progress and 
to evaluate the goodness of a partition at any moment of 
the process. 



II. FEATURES OF THE CLOSED BENCHMARKS

The main concept behind the closed benchmarks is 
the directed conversion of a network into another one 
by means of edge rewiring. The starting point is a net- 
work whose community structure is known a priori. Any 
type of graph and community structure is valid as an 
initial network. The algorithm then generates a second, 
"final" network. The initial and final networks are pre- 
cisely related. The community structure of the final net- 
work is identical to the initial one, but the labels of the 
nodes are randomly mapped from the former to the lat- 
ter. Converting the initial network into the final one 
involves rewiring links in a directed manner, a process 
depicted in Fig. 1. The details of the procedure are as
follows: 

1. Links present in both the initial and the final net- 
works will not be rewired. 

2. At each step, one of the rewirable links is removed 
and subsequently a new link is added between two 
nodes connected only in the final network. Conver- 
sion (C) is defined as the percentage of rewirable 
links modified at a particular point of the process 
of converting the initial network structure into the 
final one. 

3. At any point of this conversion process, the network 
can be saved for later analyses. Therefore, a wide 
set of intermediate structures to test the behavior of 
community detection algorithms can be obtained. 

4. The process stops when the final structure is 
reached. 
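
The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `closed_benchmark`, its parameters, the random permutation used as the node-label mapping, and the toy two-clique network are all hypothetical choices made for the example.

```python
import random
from itertools import combinations


def closed_benchmark(initial_edges, nodes, steps=101, seed=0):
    """Return ([(C, edge_set), ...], relabel) for a directed initial-to-final rewiring."""
    rng = random.Random(seed)

    # Final network: same topology as the initial one, node labels randomly remapped
    # (a uniform random permutation is an assumption of this sketch).
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    relabel = dict(zip(nodes, shuffled))

    initial = {frozenset(e) for e in initial_edges}
    final = {frozenset((relabel[u], relabel[v])) for u, v in initial_edges}

    # Step 1: links shared by both networks are never rewired.
    to_remove = list(initial - final)  # rewirable links (only in the initial network)
    to_add = list(final - initial)     # links present only in the final network
    rng.shuffle(to_remove)
    rng.shuffle(to_add)

    snapshots = []
    for step in range(steps):
        conversion = 100 * step / (steps - 1)
        k = round(len(to_remove) * step / (steps - 1))  # links rewired so far
        # Steps 2-3: remove k rewirable links, add k final-only links, save the network.
        edges = (initial - set(to_remove[:k])) | set(to_add[:k])
        snapshots.append((conversion, edges))
    return snapshots, relabel  # Step 4: the last snapshot is the final network


# Toy example (hypothetical): two 4-node cliques as the initial network.
nodes = list(range(8))
edges0 = [e for clique in (nodes[:4], nodes[4:]) for e in combinations(clique, 2)]
snaps, relabel = closed_benchmark(edges0, nodes, steps=11, seed=1)
```

Because the number of links removed always equals the number added, every snapshot keeps the same number of links, and a snapshot that has lost k of its R rewirable links is exactly R - k rewirings away from the final network.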

A significant feature of the closed benchmarks is that, during the conversion process and because of the directed rewiring of the links, we approach the final structure at the same rate that we leave the initial one. Calling D the distance between both networks, we can assert that a structure at a distance x from the start is also at a distance D - x from the end of the benchmark. This fact, together with the identical topology of both ends, produces a set of structures that is symmetrical about the 50% conversion point. That is, when C = 50%, the structure of the network is, on average, at the same distance from both the initial and the final networks. Given these patterns of network evolution, we can assume that its community structure behaves similarly. As we will describe below, this behavior is central to the evaluation of partitions in closed benchmarks.

FIG. 1: Transformation process in a closed benchmark. In this case, the starting network is the GN benchmark. Links are progressively rewired from the initial (C = 0%) to the final network (C = 100%). Node color is defined by the initial community to which each node belongs, whereas node shape corresponds to the final community in which it is contained.

Any benchmark is associated with one or several measures of performance. For clustering comparison, several methods based on pair counting, cluster matching, or information theory have been developed (reviewed in [20]). Among the latter type, the variation of information (V) [21] is an information-based distance useful for measuring the dissimilarity between two partitions, A and B ($V_{AB}$). In our context, we consider that it has clear advantages over other criteria, especially its metric nature. This property implies that V is positive-definite, symmetric (a highly desirable property when comparing clusterings) and, more importantly for our purposes, that it satisfies the triangle inequality [21]. This last fact turns out to be very useful for evaluating closed benchmarks. In these benchmarks, we have two known community structures, those of the initial (I) and final (F) networks. Moreover, the method generates a set of intermediate structures whose communities can also be estimated (E). From the triangle inequality for V we can deduce the following formula:

$$V_{AB} + V_{BC} \geq V_{AC} \qquad (1)$$

Hence, the sum of $V_{IE}$ and $V_{EF}$ is lower bounded by $V_{IF}$, which is constant, given that the partitions of the initial and final networks are fixed. If the rewiring of the network has not yet started, the optimal estimated partition is the same as the initial one, I = E, and therefore $V_{IE} = 0$ and $V_{EF} = V_{IF}$, satisfying the equality in Eq. (1). When the conversion starts, and because the network approaches the final structure at the same rate that it leaves the initial one, $V_{IE}$ should increase as much as $V_{EF}$ decreases. Therefore, unless the structure of the network becomes very different from both the initial and final structures along the conversion process (e.g., as described in the next paragraph), the equality $V_{IE} + V_{EF} = V_{IF}$ should hold throughout the conversion of the initial into the final structure. A significant deduction is that if, for a given estimated partition E, the sum of $V_{IE}$ and $V_{EF}$ deviates from the constant value $V_{IF}$, then E may not be the optimal partition [7]. Thus, deviation from the expected $V_{IF}$ value may indicate a suboptimal performance of a given algorithm.
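
As a rough illustration of this check, the sketch below computes the variation of information from its standard definition, V(A, B) = H(A) + H(B) - 2M(A, B), and measures how far an estimated partition is from the equality in Eq. (1). The function names, the use of natural logarithms, and the six-node toy partitions are assumptions made for the example only.

```python
from collections import Counter
from math import log


def variation_of_information(a, b):
    """V(A, B) = H(A) + H(B) - 2 M(A, B) for two label lists over the same nodes."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a = -sum(c / n * log(c / n) for c in pa.values())
    h_b = -sum(c / n * log(c / n) for c in pb.values())
    # Mutual information from the joint and marginal cluster counts.
    mi = sum(c / n * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())
    return h_a + h_b - 2 * mi


def excess(initial, estimated, final):
    """V_IE + V_EF - V_IF: zero when Eq. (1) holds with equality, positive otherwise."""
    return (variation_of_information(initial, estimated)
            + variation_of_information(estimated, final)
            - variation_of_information(initial, final))


# Hypothetical toy partitions of six nodes.
initial = [0, 0, 0, 1, 1, 1]
final = [0, 0, 1, 1, 1, 0]
single = [0] * 6  # every node in one community

print(excess(initial, initial, final))  # estimated partition equals the initial one
print(excess(initial, single, final))   # a third-party structure
```

An estimate that coincides with either end of the benchmark gives an excess of zero (up to rounding), whereas the single-community estimate gives a strictly positive excess because the two toy partitions share some information.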

If third-party structures, very different from the initial and final ones, are formed along the conversion process, we can find $V_{IE} + V_{EF} > V_{IF}$ even if the partition is optimal. This can be illustrated by assuming that the intermediate structure becomes fully random. Two situations are then possible, depending on the density of links in the graph. If, at some point of the rewiring, the intermediate structure becomes a single community containing all the nodes (as expected in a random graph with a high density of links), then $V_{IE} = H(I)$ and $V_{EF} = H(F)$, where H(I) and H(F) are the entropies of the initial and final partitions. Given that $V_{IF} = H(I) + H(F) - 2M(I, F)$, where M(I, F) is the mutual information between the initial and final partitions, $V_{IE} + V_{EF}$ must be somewhat larger than $V_{IF}$. This derives from the fact that M(I, F) = 0 only if I and F are independent, which is not the case here. On the other hand, if the density of links is low and the network is randomized, the community structure may approach a situation in which each node is isolated in a different community. If this is true, it can be shown that $V_{IE} = \log N - H(I)$ and $V_{EF} = \log N - H(F)$, where N is the total number of nodes. In this case, we will find $V_{IE} + V_{EF} \gg V_{IF}$. Thus, if an algorithm is performing perfectly well ($V_{IE} + V_{EF} = V_{IF}$) until a certain conversion percentage and, as conversion progresses further, we find $V_{IE} + V_{EF} > V_{IF}$, this may be due to two reasons: (i) a bad performance of the algorithm with poorly defined community structures, or (ii) the emergence of a third-party, potentially random, community structure. This interesting situation will be illustrated in a particular case below.
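
Both limiting cases follow from the identity $V_{AB} = H(A) + H(B) - 2M(A, B)$. The derivation below is a sketch; the symbols $E_1$ (all nodes in one community), $E_N$ (every node a singleton), and the joint entropy $H(I, F) = H(I) + H(F) - M(I, F)$ are notation introduced here for convenience and do not appear in the original text.

```latex
\begin{align*}
E_1:&\quad H(E_1) = 0,\; M(I,E_1) = 0
  \;\Rightarrow\; V_{IE_1} = H(I),\quad V_{E_1F} = H(F),\\
&\quad V_{IE_1} + V_{E_1F} - V_{IF} = 2M(I,F) \geq 0,\\
E_N:&\quad H(E_N) = \log N,\; M(I,E_N) = H(I)
  \;\Rightarrow\; V_{IE_N} = \log N - H(I),\quad V_{E_NF} = \log N - H(F),\\
&\quad V_{IE_N} + V_{E_NF} - V_{IF} = 2\,[\log N - H(I,F)] \geq 0.
\end{align*}
```

The last inequality holds because the joint entropy of two partitions of N nodes cannot exceed $\log N$. The excess over $V_{IF}$ is therefore small in the single-community case when I and F share little information, and large in the singleton case when the communities are much larger than single nodes.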



III. TESTS 
A. Configuration 

As mentioned above, the particular features of a network can greatly influence the ability of a given algorithm to detect its community structure. For this reason, we performed tests on computer-generated networks that varied in size, node degree distribution, number of communities, and community sizes. This last parameter has been shown to be crucial in community detection [11]. There are two main reasons for the significant effect of community size variation. First, networks with a skewed distribution of community sizes are degraded more rapidly than those with equally sized communities because small clusters are quickly destroyed. Second, a skewed distribution may greatly affect the performance of particular algorithms. For example, any algorithm maximizing a popular global measure for community detection, Newman and Girvan's modularity (Q), will have trouble detecting small communities, given that Q is affected by a resolution limit [22].

A suitable way to measure and compare the distribution of community sizes is Pielou's index (P), which quantifies how similar in size the groups into which a system is divided are. This index takes a value of 1 for equal-sized groups and decreases with increasing size variance [23]. In this study, we chose as starting points four different synthetic networks with different P values that correspond to those of already published open benchmarks. We will name them according to the following convention: (i) Girvan-Newman (GN) [13]: already mentioned above. A network of 128 nodes is divided into four communities of equal size (P = 1). Nodes are connected only with members of their own community, with an average degree of 16. (ii) Lancichinetti-Fortunato-Radicchi network with small communities (LFR$_S$) [11, 19]: a network of 5000 nodes. The average degree of the nodes is 20, their maximum degree is 50, the exponent of the degree distribution is -2, and the exponent of the community size distribution is -1. The sizes of the communities vary between 10 and 50 nodes (hence the term "small communities"). Among the many networks that can be generated with these parameters, we chose one containing 195 communities of similar sizes (P = 0.98). (iii) Relaxed caveman [T3] with Pielou's index = 0.75 (RC75): because a more skewed distribution of community sizes was required to analyze the behavior of the algorithms in a wider range of network structures, we generated a network of 512 nodes with P = 0.75, which corresponds to a division into 16 communities, each of them including from 2 to 196 nodes. In the RC75 configuration, the initial network consisted of unconnected communities, each one maximally connected internally, i.e., forming a clique. (iv) Relaxed caveman, P = 0.50 (RC50): this has an even more extreme variation in community sizes. The initial network also comprises 512 nodes forming 16 cliques, but now the largest one contains 354 nodes. Figure 2 graphically displays the pattern of connections of each of these four initial networks. Once obtained, they were progressively modified by increasing C, finally obtaining from each one a set of 101 network structures spanning the whole range from C = 0 (initial structure present) to C = 100 (final structure present). The corresponding open benchmarks, with the same starting community structures and progressive degradation toward randomness, were also analyzed following standard methods described in previous papers (see, e.g., [13, 14, 19]). We also discuss below in some detail closed benchmarks with random initial structures.

FIG. 2: Graphical view of the adjacency matrices of the four initial networks used in the tests. Nodes are ordered according to the communities to which they belong. Black indicates that two nodes are connected. Differences in relative community sizes are evident. In the GN network, the nodes of the four equal-sized communities are sparsely connected. The groups in the LFR$_S$ network are also sparse. However, there are so many of them (195) that visualization is difficult at this resolution level. The RC initial networks (RC75 and RC50) are formed by 16 cliques and the distribution of community sizes is highly skewed, especially in RC50, where a single community dominates the network.
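
Pielou's index, used above to characterize the four size distributions, can be computed directly from the list of community sizes. The sketch below assumes the usual definition of Pielou's evenness (Shannon entropy of the size proportions divided by the logarithm of the number of groups); the function name and the skewed example sizes are hypothetical and are not the sizes of the RC75 or RC50 networks.

```python
from math import log


def pielou_index(sizes):
    """Pielou's evenness: Shannon entropy of the size proportions over log(#groups)."""
    total = sum(sizes)
    k = len(sizes)
    if k < 2:
        return 1.0  # convention assumed here for a single group
    h = -sum(s / total * log(s / total) for s in sizes if s > 0)
    return h / log(k)


equal = pielou_index([32, 32, 32, 32])   # GN-like: four communities of 32 nodes
skewed = pielou_index([100, 10, 10, 8])  # hypothetical skewed distribution
print(equal, skewed)
```

Equal-sized groups give the maximum value, and the index drops as one group comes to dominate, which matches the qualitative behavior described for P in the text.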



B. Algorithms 

Two community detection algorithms that have shown excellent performance in recent studies, namely Infomap [8] and SCluster [15], were used in this work. Infomap treats finding the community structure of a network as an information compression problem, detecting communities while compressing the topology of the network. It has achieved excellent results on the LFR benchmarks [7, 11]. SCluster, on the other hand, uses a completely different approach. Using iterative hierarchical clustering [15, 24], the algorithm computes the pairwise distances of the nodes from partial clustering solutions. Subsequently, it constructs a hierarchical tree from which the partition of maximum Surprise [7] is chosen as the optimal solution. Surprise is a quality function that estimates the goodness of a partition based on the comparison between the graph and the null model generated by a random distribution of links [7, 23]. SCluster has demonstrated an ability to extract high-quality partitions when dealing with networks whose communities strongly vary in size [7, 15]. Moreover, as a third way to extract the best clustering of the network, we selected from the Infomap and SCluster solutions the one with the highest Surprise, given that we showed before that Surprise maximization not only qualitatively outperformed maximizing the most commonly used global index, namely Newman and Girvan's Q, but also improved the solutions generated by any single algorithm [7].
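
The selection of the highest-Surprise partition can be sketched as follows. The text does not give a formula for Surprise, so the cumulative hypergeometric form used below, as well as the names `surprise`, `F`, `M`, `n`, and `p`, the choice of natural logarithms, and the toy two-clique graph, are assumptions of this example rather than a statement of the authors' implementation.

```python
from collections import Counter
from itertools import combinations
from math import comb, log


def surprise(edges, labels):
    """-log of the cumulative hypergeometric tail (assumed form of Surprise)."""
    n_nodes = len(labels)
    F = n_nodes * (n_nodes - 1) // 2  # all possible links
    M = sum(s * (s - 1) // 2 for s in Counter(labels.values()).values())  # possible intra links
    n = len(edges)                    # observed links
    p = sum(1 for u, v in edges if labels[u] == labels[v])  # observed intra links
    # Probability of at least p intracommunity links if the n links were placed at random.
    tail = sum(comb(M, j) * comb(F - M, n - j) for j in range(p, min(M, n) + 1))
    return log(comb(F, n)) - log(tail)


# Toy graph (hypothetical): two disconnected 4-node cliques.
edges = [e for clique in ((0, 1, 2, 3), (4, 5, 6, 7)) for e in combinations(clique, 2)]
two_cliques = {v: v // 4 for v in range(8)}
one_cluster = {v: 0 for v in range(8)}

# Pick the candidate partition with the highest Surprise.
best = max((two_cliques, one_cluster), key=lambda part: surprise(edges, part))
```

Exact integer binomials keep the sketch simple but become slow on large networks; a log-space evaluation would be needed at the scale of the 5000-node benchmark.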



C. Results

Figure 3 illustrates the results of the three methods in our four closed benchmarks. Each partition estimated along the conversion process is compared, using the variation of information, with both the initial (black circles) and the final (red squares) community structures. V = 0 means that the partitions compared are exactly the same. We previously mentioned that the sum of the variation of information from an estimated partition to the initial and to the final optimal partitions ($V_{IE} + V_{EF}$) should optimally be constant and equal to the V between the initial and the final partitions ($V_{IF}$). For visualization reasons, half of this sum ($\bar{V} = [V_{IE} + V_{EF}]/2$) is shown in the figures as a dashed line. $\bar{V} = V_{IF}/2$ is expected if the partition is optimal.

FIG. 3: Variation of information behavior in the four closed benchmarks used in this study. Black circles depict the V between the initial and the estimated partition ($V_{IE}$). Red (gray) squares show the V between the estimated and the final partition ($V_{EF}$). $\bar{V}$ appears as a dashed line, which should be straight if the performance of the algorithm is optimal during the whole conversion process (i.e., $V_{IE} + V_{EF} = V_{IF}$).



The plots show how different the community detection process is, depending on both the algorithm applied and the topology of the network analyzed. When using the GN network as an input, Infomap performs very well [Fig. 3(a)]. The variation of information between the initial and the estimated partition ($V_{IE}$, black dots) is zero or near zero along the first half of the benchmark. Moreover, when the conversion (C) passes the 50% mark, the V between the estimated and the final partition ($V_{EF}$, gray squares) behaves in the same way. That is, the algorithm recognizes the initial structure until C = 49% and the final one above C = 51%. This is not the case when applying SCluster [Fig. 3(e)], which only recognizes the initial partition up to C = 30% and starts recognizing the final partition beyond C = 70%. As expected, $\bar{V}$ graphically shows this difference in the performance of the two algorithms. While in the Infomap plot $\bar{V}$ falls on an almost straight line, matching $V_{IF}/2$, the partitions estimated with SCluster produce a significant deviation from that line in the interval 30-70%, where we already detected that the communities were poorly estimated.

When the input of the benchmark is the LFR$_S$ network, Infomap also produces a symmetrical plot, with $\bar{V}$ almost perfectly matching $V_{IF}/2$ [Fig. 3(b)]. SCluster also shows a symmetrical performance in this case, although with a slight deviation from the optimal values [Fig. 3(f)], i.e., again performing worse than Infomap. In these first two examples, the sizes of the communities are equal or very similar ($P \approx 1$), and they are expected to be degraded, on average, at the same time. The original partition is thus present during the first half of the conversion (giving $V_{IE} \approx 0$), and then the community structure suddenly swaps to the final one (and then $V_{EF} \approx 0$). On the other hand, when analyzing networks with a strongly skewed distribution of community sizes (RC75, RC50), the performance of the algorithms radically changes. In the RC75 test, Infomap exhibits a nonsymmetrical behavior [Fig. 3(c)], with $\bar{V} > V_{IF}/2$ when C = 40-60%. On the contrary, SCluster shows a symmetrical pattern with $\bar{V} = V_{IF}/2$ [Fig. 3(g)]. We can see how the V between the initial and the estimated partition ($V_{IE}$, black circles) is equal to zero until around C = 30%, at which point it starts to increase. It is very significant that, in an open benchmark (see, e.g., Refs. [13, 14, 19]), this would be the only available information. Thus we might conclude that from C = 30% on, these two algorithms fail to recognize the optimal partition. However, a bad algorithm performance is not the only explanation for such patterns. Alternatively, it is possible that the initial partition should no longer be detected as optimal because the community structure has changed. The closed benchmarks offer a solid way to check whether this latter hypothesis is correct. In Figs. 3(c), 3(g), and 3(k), we can see that, although $V_{IE}$ soon starts to grow, $V_{EF}$ begins to decrease at the same rate. That is, the community structure of the initial partition is shifting toward the final one well before the C = 50% mark is passed, a pattern that is due to the rapid destruction of small communities, typical of benchmarks with low P. This behavior was impossible to check in any of the benchmarks published so far, although it is critical for algorithm evaluation. Now, we can assert that the behavior of SCluster is optimal, given that $\bar{V}$ follows a straight line: it satisfies the equality in Eq. (1) during the whole conversion process. In the last case, RC50, the performance of the algorithms follows a pattern that differs somewhat from the rest of the benchmarks. Infomap seems to collapse rapidly, with $\bar{V}$ moving away from the optimal straight line when C > 10-12% [Fig. 3(d)]. In the case of SCluster [Fig. 3(h)], $\bar{V}$ values stay close to the line considerably longer (up to C around 30%), but then the algorithm starts recognizing third-party structures, far away from both the initial and final partitions ($\bar{V} > V_{IF}/2$). These behaviors are due to the extremely skewed distribution of community sizes, with a very large group that dominates the network [Fig. 2(d)]. For these reasons, a quasi-random graph is formed as the conversion process of the benchmark approaches 50%. Infomap interprets this situation as if most of the network is included in a single community. Hence, as we discussed above, $\bar{V}$ approximates H(I) (which in this example takes a value of 1.38). SCluster, on the other hand, interprets the network structure as including many singletons. Therefore, $\bar{V}$ becomes much larger than $V_{IF}/2$ for the reasons previously discussed.

FIG. 4: Open benchmarks with starting structures identical to the initial structures of the closed benchmarks shown in Fig. 3. These structures are progressively degraded by randomly shuffling links. The percentage of rewired links is indicated on the x axis. The dashed line indicates the $V_{IF}/2$ value of the corresponding closed benchmark. Stars indicate the partitions with the highest Surprise values.

Figures 3(i)-3(l) show the evolution of each benchmark using as the estimated partition the one with the highest Surprise among the solutions provided by the two algorithms. As expected [7], this approach always selects the better of the two partitions. The equality in Eq. (1) is satisfied throughout the first three benchmarks. In the fourth case, the pattern is identical to that produced by SCluster. The Surprise values of the RC50 benchmark suggest that the SCluster interpretation, defining many small clusters in the quasi-random intermediate structure generated when C > 30%, is preferable to the one suggested by Infomap (dominated by a single huge cluster), in good agreement with the fact that SCluster is, as already indicated above, performing better than Infomap in this benchmark in the adjacent conversion range (30% > C > 12%). For comparative purposes, we also generated the corresponding open benchmarks, which start with the same structures as those of our closed benchmarks but are then progressively degraded toward undetermined, random structures by rewiring their links [13, 14, 19]. Figure 4 shows the variation of information between the original partition and those obtained by the SCluster and Infomap algorithms. The partition with maximum Surprise is marked with a star. We have also depicted in Fig. 4 the value of $\bar{V}$ in the corresponding closed benchmarks (dashed line). As found before in related cases [7], neither of the two algorithms is the best in all situations. If we use the Surprise values as a guide, it can be seen that SCluster improves upon Infomap when degradation is very high and systematically in the benchmarks with the lowest Pielou's indices (RC75 and RC50), while Infomap works better when degradation is low and Pielou's index is high (see GN and LFR$_S$ benchmarks). This situation is fundamentally caused by Infomap solutions often consisting of a single community (this happens in all the cases shown in Fig. 4 in which the Infomap V values are above the $\bar{V}$ dashed lines). These results for the open benchmarks are fully compatible with those shown in Fig. 3 for closed benchmarks.

The comparison of the values of $\bar{V}$ in Figs. 3 and 4 enables us to precisely understand the relationships between both types of benchmarks. Looking at the dashed lines in those figures allows us to estimate the approximate difficulty of reconstructing the community structure present in the closed benchmarks when compared with the open ones. Thus, we can see that C = 50% in the GN closed benchmark corresponds to a rewiring percentage of more than 40% in the corresponding open GN benchmark, while C = 50% in the LFR$_S$, RC75, and RC50 closed benchmarks may correspond, respectively, to rewiring about 80%, 60%, and (this can be ascertained less precisely) 50-70% of the links in the corresponding open benchmark. Thus, the GN, LFR$_S$, and RC75 closed benchmarks always have a substantial level of structure, which explains the good fit to the $\bar{V}$ value observed in Fig. 3.

Random networks can also be used as starting points for a closed benchmark. The comparison with these random network-based benchmarks may help to determine whether or not a given network has a statistically significant community structure, a topic that has recently received some attention [25, 26]. To address this issue, we generated four types of random graphs, each of them having the same number of nodes and edges as one of the initial networks described above (GN, LFR$_S$, RC75, and RC50), but with the edges randomly distributed. Given that, for generating a closed benchmark, a community structure must be assumed a priori, Infomap and SCluster were tested on those random networks and the community structure with the highest Surprise value was selected. Figure 5 shows the results of closed benchmarks generated using the four random networks. As occurred above, Infomap returns partitions in which all nodes [Figs. 5(a)-5(c)], or at least more than 90% of the nodes [Fig. 5(d)], belong to one community. The V observed is the entropy of the initial (or final) partition, H(I) = H(F), given that, if all nodes are in a single community, H(E) = 0. On the other hand, SCluster generates solutions with a high number of communities [Figs. 5(e)-5(h)], interpreting that even a random graph contains a certain degree of community structure. In these random graph benchmarks, an interesting point is the extremely fast degradation of the partitions when only 1% of the links have been rewired (Fig. 5). When compared with its analogous nonrandom network, $V_{IE}$ rises instantaneously, which is the behavior expected for networks in which communities are barely defined. This kind of comparison between variation of information patterns may enable us to evaluate the robustness of a network, similarly to what has been done using other methods [25].

FIG. 5: Random networks with the same number of nodes as the corresponding closed benchmarks indicated on top. As in Fig. 3, the dashed line corresponds to the $\bar{V}$ value, while red (gray) dots correspond to $V_{EF}$ values and black squares to $V_{IE}$ values. The values of $V_{IE}$, $V_{EF}$, and $\bar{V}$ largely or fully coincide in Infomap analyses, appearing as a single line or close parallel lines. Notice that as soon as conversion starts, $V_{IE} + V_{EF} \gg V_{IF}$. Differences between Infomap and SCluster are due to the different ways they interpret the random structures present, i.e., as a single cluster (Infomap) or as many individual clusters (SCluster).



IV. DISCUSSION

The development of methods that can accurately detect community structure in networks is critical in many scientific fields, since they can reveal deep underlying relationships among the elements of a system. Therefore, it is very important to compare and evaluate such methods against a set of synthetic benchmarks in order to select one method, or a combination of methods, that can produce reliable results when analyzing real-world networks. Several standard benchmarks for testing community detection algorithms have been proposed, most of them of the class we called open: they start with a network of well-defined community structure and then the structure is degraded by randomly rewiring links [13, 14, 19]. During this process, the communities gradually disappear toward an "open end" at which the precise community structure is undetermined. This type of benchmark is useful for comparing the relative performance of algorithms but inadequate for assessing their intrinsic quality (i.e., whether the solutions provided are optimal or not). In this paper, we have fully described the closed benchmarks, which also degrade an initial network with defined communities, but this time evolving toward a second, known network structure. This evolution is produced by a directed rewiring of the links from the initial to the final network, and it enables us to control the progression of the structure between both ends. We have also shown that the variation of information provides valuable information about the goodness of a partition and its possible optimality: the configuration of our closed benchmarks allows us to lower bound the expected V value using the triangle inequality that the metric must satisfy. Another relevant improvement over the available open benchmarks is the fact that any network can be used as input for the degradation process, enabling us to carry out extensive studies over a wide variety of network topologies. These features clearly represent qualitative improvements over the benchmarks published so far. The comparisons of open and closed benchmarks, or of networks of known structure and random networks (Figs. 4 and 5), are also interesting ways to further develop this methodology.

As we have shown, there may be scenarios with very skewed distributions of community sizes, such as the RC50 network (Fig. 3), where the equality in Eq. (1) is not satisfied during the whole process of conversion. Nevertheless, this behavior in such extreme networks does not diminish the potential of our approach because, even then, there are several conditions that a good algorithm must fulfill. First, when 50% of the links have been rewired, $V_{IE}$ must be, on average, equal to $V_{EF}$. Second, the initial partition has to be recognized better than the final one during the first half of the benchmark and, from there on, the behavior should be exactly the opposite. Third, a good algorithm will provide solutions with $V_{IE} + V_{EF} = V_{IF}$ along a longer range of the conversion process than a bad one. In summary, the properties of the closed benchmarks make them highly valuable for the development and evaluation of computational methods to effectively characterize the community structure of a network.